Abstract: The performance of image classification networks is constrained by their reliance on spatial domain features and their neglect of frequency domain features. To address these issues, a two-domain feature association network for image classification (TANet) is proposed. First, a frequency domain feature extraction (FDFE) module is designed. The Fast Fourier Transform is employed to capture frequency domain details and global features in the image, reducing the loss of key features, strengthening the representation of image details, and improving the feature extraction ability of the network. Then, a frequency domain attention mechanism (FDAM) is proposed. Multi-scale spatial domain features are taken into account and combined with the Fast Fourier Transform to extract frequency domain information, enhancing the sensitivity to image details and increasing the contribution of key regions. Subsequently, a two-domain feature association mechanism (TFAM) is designed to fuse frequency domain features with spatial domain features. While spatial domain features are retained, frequency domain features supplement the image details and global information, thereby strengthening the expressive power of the features. Finally, FDAM is embedded into the residual branch to learn the two-domain features of the input more effectively, balancing the attention paid to local and global information, improving the availability of key features, and enhancing the classification capability of the network. Experiments on five public datasets show that TANet improves image classification performance by incorporating frequency domain features: it extracts image details and global features, reduces the loss of key features, sharpens the perception of important regions, and strengthens the expression of features.
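To make the abstract's pipeline concrete, the following is a minimal PyTorch sketch of the three components it names: an FFT-based feature branch standing in for FDFE, a channel attention derived from the amplitude spectrum standing in for FDAM, and a learnable gated fusion standing in for TFAM, all wired into a residual block. The class names, the 1x1 frequency-domain convolution, the single-scale amplitude pooling, and the scalar fusion gate are illustrative assumptions; the paper's actual layer layout, multi-scale design, and fusion rule are not reproduced here.

```python
import torch
import torch.nn as nn


class FDFE(nn.Module):
    """Sketch of frequency domain feature extraction (assumed design):
    2-D FFT -> learnable 1x1 conv on real/imag parts -> inverse FFT."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv acts on the stacked real and imaginary parts.
        self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = x.shape[1]
        f = torch.fft.fft2(x, norm="ortho")           # complex spectrum
        f = torch.cat([f.real, f.imag], dim=1)        # (B, 2C, H, W)
        f = self.freq_conv(f)                         # learn in frequency domain
        f = torch.complex(f[:, :c], f[:, c:])         # back to complex
        return torch.fft.ifft2(f, norm="ortho").real  # back to spatial domain


class FDAM(nn.Module):
    """Hypothetical frequency domain attention: channel weights computed
    from the globally pooled FFT amplitude spectrum (a single-scale
    stand-in for the paper's multi-scale FDAM)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        amp = torch.fft.fft2(x, norm="ortho").abs()   # amplitude spectrum
        w = self.mlp(amp.mean(dim=(2, 3)))            # (B, C) channel weights
        return x * w[:, :, None, None]


class TwoDomainBlock(nn.Module):
    """Residual block with FDAM embedded in the residual branch and a
    learnable scalar gate standing in for the paper's TFAM fusion."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.fdam = FDAM(channels)
        self.fdfe = FDFE(channels)
        self.gate = nn.Parameter(torch.zeros(1))      # fusion weight (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.fdam(self.spatial(x))                # attended spatial branch
        f = self.fdfe(x)                              # frequency branch
        g = torch.sigmoid(self.gate)
        return x + g * f + (1 - g) * s                # residual two-domain fusion
```

Stacking such blocks in place of standard residual blocks is one plausible reading of how TANet augments a backbone; the gated sum merely illustrates the idea of retaining spatial features while letting frequency features supplement them.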